Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
The complementary potential of Large Language Models (LLMs) assumes that
off-the-shelf LLMs have heterogeneous expertise across a wide range of domains
and tasks, so that an ensemble of LLMs can achieve consistently better
performance.
Existing ensemble methods for LLMs mainly focus on reward model ranking of
outputs, leading to significant computation overhead. To combat this issue, we
revisit the complementary potential of LLMs and further elaborate on it by
mining latent expertise with off-the-shelf reward models. We propose Zooter, a
reward-guided routing method that distills rewards on training queries to
train a routing function, which can then route each query precisely to the
LLM with the relevant expertise. We also integrate a tag-based label
enhancement to mitigate the noise from uncertainty when using rewards as
silver supervision. Zooter is computationally efficient at inference,
introducing only the minor overhead of a routing function compared with
reward model ranking methods. We evaluate Zooter on a comprehensive benchmark
collection with 26 subsets across different domains and tasks. Zooter
outperforms the best single model on average and ranks first on 44% of tasks,
even surpassing multiple reward model ranking methods.
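To make the distillation step concrete, here is a minimal sketch of the
training signal for a reward-guided router, assuming softmax-normalized
reward scores act as soft labels; the function names, the temperature
parameter, and the toy numbers are illustrative assumptions, not the paper's
code.

```python
# Minimal sketch of reward distillation for a routing function (NumPy only).
# Assumption: each candidate LLM's output on a training query has already
# been scored by an off-the-shelf reward model; the router learns to match
# the softmax-normalized score distribution.
import numpy as np

def softmax(x, temperature=1.0):
    z = np.asarray(x, dtype=float) / temperature
    z = z - z.max()                         # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(route_logits, reward_scores, temperature=1.0):
    """KL(target || prediction): the target is the reward distribution
    over candidate LLMs (silver supervision); the prediction comes from
    the routing function's logits for the query."""
    target = softmax(reward_scores, temperature)
    pred = softmax(route_logits)
    eps = 1e-12
    return float(np.sum(target * (np.log(target + eps) - np.log(pred + eps))))

# Toy example: three candidate LLMs scored on one training query.
rewards = [2.1, 0.3, -1.0]   # reward model scores, one per LLM output
logits = [0.5, 0.2, 0.1]     # router's current logits for this query
print(f"distillation loss: {distillation_loss(logits, rewards):.4f}")

# At inference only the router runs: the query goes to the argmax LLM,
# avoiding the cost of generating and ranking every candidate output.
```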
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Foundation language models acquire instruction-following ability through
supervised fine-tuning (SFT). Diversity and complexity are considered
critical factors of a successful SFT dataset, yet their definitions remain
obscure and quantitative analyses are lacking. In this work, we propose
InsTag, an open-set fine-grained tagger that tags samples within SFT datasets
based on semantics and intentions, and we define instruction diversity and
complexity in terms of tags. We
obtain 6.6K tags to describe comprehensive user queries. Then we analyze
popular open-sourced SFT datasets and find that model abilities grow with
more diverse and complex data. Based on this observation, we propose a data
selector based on InsTag to select 6K diverse and complex samples from
open-source datasets and fine-tune models on InsTag-selected data. The
resulting models, TagLM, outperform open-source models fine-tuned on
considerably larger SFT datasets, as evaluated by MT-Bench, echoing the
importance of query diversity and complexity. We open-source InsTag at
https://github.com/OFA-Sys/InsTag
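As a rough illustration of tag-based selection, the sketch below greedily
prefers complex samples (those with more tags) while requiring each pick to
add at least one uncovered tag, so the chosen subset grows in both complexity
and diversity. This greedy rule and the names (tag_sets, budget) are our
assumptions for illustration, not the paper's exact selector.

```python
# Minimal sketch of complexity-first, diversity-aware data selection
# in the spirit of InsTag; hypothetical inputs and selection rule.
from typing import List, Set

def select_samples(tag_sets: List[Set[str]], budget: int) -> List[int]:
    """Pick up to `budget` sample indices: visit samples in order of
    complexity (tag count), and keep a sample only if it contributes
    a tag not yet covered by the selected subset."""
    order = sorted(range(len(tag_sets)),
                   key=lambda i: len(tag_sets[i]), reverse=True)
    covered: Set[str] = set()
    chosen: List[int] = []
    for i in order:
        if len(chosen) >= budget:
            break
        if tag_sets[i] - covered:          # brings at least one new tag
            chosen.append(i)
            covered |= tag_sets[i]
    return chosen

# Toy example: four SFT samples with semantic/intention tags.
samples = [{"math", "proof"}, {"math"},
           {"code", "python", "debug"}, {"translation"}]
print(select_samples(samples, budget=2))   # -> [2, 0]
```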
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Large sequence models (SMs) such as the GPT series and BERT have displayed
outstanding performance and generalization capabilities on vision, language,
and, more recently, reinforcement learning tasks. A natural follow-up question is how
to abstract multi-agent decision making into an SM problem and benefit from the
prosperous development of SMs. In this paper, we introduce a novel architecture
named Multi-Agent Transformer (MAT) that effectively casts cooperative
multi-agent reinforcement learning (MARL) into an SM problem wherein the task
is to map the agents' observation sequence to the agents' optimal action sequence. Our
goal is to build the bridge between MARL and SMs so that the modeling power of
modern sequence models can be unleashed for MARL. Central to our MAT is an
encoder-decoder architecture which leverages the multi-agent advantage
decomposition theorem to transform the joint policy search problem into a
sequential decision making process; this renders only linear time complexity
for multi-agent problems and, most importantly, endows MAT with a monotonic
performance improvement guarantee. Unlike prior art such as Decision
Transformer, which fits only pre-collected offline data, MAT is trained by
online trial and error in the environment in an on-policy fashion. To validate
MAT, we conduct extensive experiments on the StarCraft II, Multi-Agent MuJoCo,
Dexterous Hands Manipulation, and Google Research Football benchmarks. Results
demonstrate that MAT achieves superior performance and data efficiency compared
to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that
MAT is an excellent few-shot learner on unseen tasks regardless of changes in
the number of agents. See our project page at
https://sites.google.com/view/multi-agent-transformer.
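The sequential decision process can be sketched as autoregressive action
decoding: encode all observations once, then emit actions agent by agent,
each step conditioning on earlier actions, so the cost is linear in the
number of agents. The toy linear "encoder" and "decoder" below are stand-ins
for the paper's attention blocks, and this simplified sketch conditions only
on the immediately preceding action rather than attending to all earlier
ones.

```python
# Minimal sketch of sequential multi-agent action decoding (NumPy only).
# The weights and shapes are toy stand-ins, not MAT's architecture.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, N_ACTIONS = 3, 4, 5
W_enc = rng.normal(size=(OBS_DIM, 8))                # toy encoder weights
W_dec = rng.normal(size=(8 + N_ACTIONS, N_ACTIONS))  # toy decoder weights

def encode(obs):
    """Encoder: embed every agent's observation into a joint
    representation (one pass over all agents)."""
    return np.tanh(obs @ W_enc)                      # (n_agents, 8)

def decode_step(h_i, prev_action_onehot):
    """Decoder step: agent i's action logits condition on its encoded
    observation and the previous agent's action (autoregression)."""
    x = np.concatenate([h_i, prev_action_onehot])
    return x @ W_dec

def joint_action(obs):
    """One decoding pass: agents act in turn, each seeing an earlier
    action, turning the joint policy search into a sequential process
    whose cost grows linearly with the number of agents."""
    h = encode(obs)
    prev = np.zeros(N_ACTIONS)                       # start token
    actions = []
    for i in range(N_AGENTS):
        logits = decode_step(h[i], prev)
        a = int(np.argmax(logits))                   # greedy, for illustration
        actions.append(a)
        prev = np.eye(N_ACTIONS)[a]
    return actions

obs = rng.normal(size=(N_AGENTS, OBS_DIM))
print(joint_action(obs))
```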
Qwen Technical Report
Large language models (LLMs) have revolutionized the field of artificial
intelligence, enabling natural language processing tasks that were previously
thought to be exclusive to humans. In this work, we introduce Qwen, the first
installment of our large language model series: a comprehensive family
encompassing distinct models with varying parameter counts. It includes Qwen,
the base pretrained language models, and Qwen-Chat, chat models fine-tuned
with human alignment techniques. The base language
models consistently demonstrate superior performance across a multitude of
downstream tasks, and the chat models, particularly those trained using
Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The
chat models possess advanced tool-use and planning capabilities for creating
agent applications, showcasing impressive performance even when compared to
bigger models on complex tasks like utilizing a code interpreter. Furthermore,
we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as
well as a mathematics-focused model, Math-Qwen-Chat, all built upon the base
language models. These models demonstrate significantly improved performance
compared with open-source models, and fall only slightly behind the
proprietary models.